AD-A181  328 


REPORT DOCUMENTATION PAGE

1b. RESTRICTIVE MARKINGS

2b. DECLASSIFICATION / DOWNGRADING SCHEDULE

3. DISTRIBUTION / AVAILABILITY OF REPORT
Approved for public release; Distribution Unlimited

4. PERFORMING ORGANIZATION REPORT NUMBER(S)
Technical Report #2

5. MONITORING ORGANIZATION REPORT NUMBER(S)

6a. NAME OF PERFORMING ORGANIZATION
University of Pennsylvania

6b. OFFICE SYMBOL (If applicable)

6c. ADDRESS (City, State, and ZIP Code)
3815 Walnut Street
Philadelphia, Pennsylvania 19104-6196

7a. NAME OF MONITORING ORGANIZATION

7b. ADDRESS (City, State, and ZIP Code)
Arlington, Virginia 22217-5000

8a. NAME OF FUNDING / SPONSORING ORGANIZATION

8b. OFFICE SYMBOL (If applicable)

8c. ADDRESS (City, State, and ZIP Code)

9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER
N00014-85-K-0643

10. SOURCE OF FUNDING NUMBERS
PROGRAM ELEMENT NO. | PROJECT NO.
61153N | RR04204 | RR04206-01

11. TITLE (Include Security Classification)
Steps Toward an Empirical Evaluation of Robust Regression Applied to Reaction-Time Data

12. PERSONAL AUTHOR(S)
Sternberg, Saul; Turock, David L. (AT&T Bell Labs); Knoll, Ronald L. (AT&T Bell Labs)

13a. TYPE OF REPORT
Technical Report

14. DATE OF REPORT (Year, Month, Day)
November

15. PAGE COUNT

16. SUPPLEMENTARY NOTATION
Work Collaborative with AT&T Bell Laboratories

17. COSATI CODES
FIELD | GROUP | SUB-GROUP

18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)
Psychology, Perception, Visual, Statistics, Mathematics, Reaction-time, Memory

19. ABSTRACT (Continue on reverse if necessary and identify by block number)


Current statistical theory provides little useful guidance about how to reduce the sensitivity of
analyses of reaction-time data to aberrant observations and violations of statistical assumptions,
either because robust methods, which are very inviting, are insufficiently understood, or because
aspects of the data are not characterized fully enough to permit devising theoretically justifiable
analyses. In this report we suggest an empirical approach, in which one applies the same criteria
to the problem of selecting a statistical method as those that one uses to select among
alternative experimental procedures. We define six such criteria, and then describe five tests
based on a set of about 36,000 observations in which we compare ordinary least-squares multiple
regression as a way of characterizing the data with Huber’s robust iteratively reweighted
least-squares method. Results favor the robust method.


20. DISTRIBUTION / AVAILABILITY OF ABSTRACT
☒ UNCLASSIFIED/UNLIMITED   □ SAME AS RPT.   □ DTIC USERS

21. ABSTRACT SECURITY CLASSIFICATION

22a. NAME OF RESPONSIBLE INDIVIDUAL
Dr. Harold Hawkins

22b. TELEPHONE (Include Area Code)
202-696-4323

22c. OFFICE SYMBOL
ONR1142PT

DD FORM 1473, 84 MAR
83 APR edition may be used until exhausted. All other editions are obsolete.
SECURITY CLASSIFICATION OF THIS PAGE


Steps Toward an Empirical Evaluation of
Robust Regression Applied to Reaction-Time Data¹


Saul Sternberg
University of Pennsylvania
AT&T Bell Laboratories

David L. Turock
AT&T Bell Laboratories

and

Ronald L. Knoll
AT&T Bell Laboratories




1. Supported in part by Contract N00014-85-K-0643
between the Office of Naval Research and the University
of Pennsylvania.

Reproduction in whole or in part is permitted for any purpose of the
United States Government.


Approved for public release; distribution unlimited.


Supported in part by the Personnel and Training Research Programs,
Psychological Sciences Division, Office of Naval Research, under
Contract No. N00014-85-K-0643, Contract Authority Identification
Number [illegible].




Abstract 




Current  statistical  theory  provides  little  useful  guidance  about  how  to  reduce 
the  sensitivity  of  analyses  of  reaction-time  data  to  aberrant  observations  and 
violations  of  statistical  assumptions,  either  because  robust  methods,  which  are 
very  inviting,  are  insufficiently  understood,  or  because  aspects  of  the  data  are 
not  characterized  fully  enough  to  permit  devising  theoretically  justifiable 
analyses.  In  this  report  we  suggest  an  empirical  approach,  in  which  one  applies 
the  same  criteria  to  the  problem  of  selecting  a  statistical  method  as  those  that 
one  uses  to  select  among  alternative  experimental  procedures.  We  define  six 
such  criteria,  and  then  describe  five  tests  based  on  a  set  of  about  36,000 
observations  in  which  we  compare  ordinary  least-squares  multiple  regression  as  a 
way  of  characterizing  the  data  with  Huber’s  robust  iteratively  reweighted  least- 
squares  method.  Results  favor  the  robust  method. 


1.  Introduction 

During  the  past  two  decades  there  has  been  a  vast  growth  in  the  use  of 
reaction-time  methods  in  psychological  research,  largely  because  psychologists 
have  recognized  that  the  analysis  of  reaction-time  data  can  lead  to  powerful 
inferences  about  the  structure  of  mental  processes.  However,  the  problem  of 
contamination  of  RT  distributions  by  aberrant  observations  is  one  that  has  not 
been  solved  either  experimentally  or  analytically,  and  as  inferences  become  more 
subtle,  even  small  distortions  in  the  data  take  on  added  importance. 

To  our  knowledge  the  properties  that  have  been  established  for  all  of  the 
existing  statistical  methods  of  outlier  elimination  depend  on  assumptions  that 
are  either  known  to  be  false  or  are  difficult  to  validate  in  most  reaction-time 
data.  (We  assume  that  subjective  methods  of  data  trimming,  although  used, 
should  be  regarded  as  temporary  expedients  only,  and  we  ignore  those  fortunate 
situations  in  which  secondary  concomitant  observations  are  available  to  serve  as 
a  basis  for  data  exclusion.)  One  commonly  required  property,  for  example,  is 
that  the  form  of  the  distribution  is  the  same  for  the  observations  in  each  cell  of 
an  experimental  design;  that  is,  distributions  associated  with  different  cells 
differ  at  most  in  scale  and  location  parameters.  A  priori  this  is  unlikely  to  be 
true,  given  the  factors  whose  levels  typically  vary  from  cell  to  cell  and  what  we 
know  about  their  effects  on  RT,  and  empirically  the  typical  sample  sizes  per  cell 
needed  to  validate  it  are  difficult  to  achieve.  Other  unpalatable  assumptions 
that  are  sometimes  required  include  symmetry  and  sometimes  normality  of 
either  the  underlying  distribution  or  the  contamination  distribution;  both  of 
these  are  likely  to  be  false  for  reaction-time  data,  which  are  typically  skewed 
toward  large  values. 

Methods  that  trim  data  or  weight  them  differentially  have  often  been  shown 
to  lead  to  parameter  estimates  that  are  more  efficient  (smaller  mean  squared 
error)  than  others,  but  these  estimates  may  be  biased  in  small  samples  to  an 
extent  that  varies  in  unknown  ways  across  levels  of  experimental  factors,  or 
across  "conditions".  Moreover,  bias  is  especially  troublesome  relative  to 
inefficiency  when  precise  quantitative  models  are  being  tested,  or  when 
quantitative  aspects  of  the  data,  such  as  additivity  of  effects  in  factorial  designs 
or  linearity  of  the  effects  of  quantitative  factors,  are  of  interest.  The 
importance  of  additivity  and  linearity  also  argues  against  the  use  of  nonlinear 
transformations  of  reaction  times,  which  is  sometimes  proposed  as  a  way  to 
increase  the  likelihood  that  the  data  satisfy  assumptions  such  as  those 
mentioned above and to reduce the effects of skewness and contamination.²

In their book on applied regression analysis, Draper and Smith (1981, p. 344)
express  a  conservative  and  idealistic  view  about  the  use  of  robust  regression: 

"We  believe  use  of  robust  regression  methods  is  inadvisable  at  the  present  time, 
unless  rules  for  deciding  which  robust  method  to  use  in  which  circumstances 
have  been  formulated  and  proved  effective.  If  the  model  (which  includes  the 
assumptions  about  the  error  distribution)  is  wrong,  the  appropriate  action  to 
take  is  to  change  the  model  and  use  maximum  likelihood  estimation  on  the  new 
model,  not  to  change  the  method  of  estimation." 

The  principal  difficulty  with  the  approach  advocated  by  Draper  and  Smith  is 
that  the  effort  needed  in  the  context  of  each  experimental  problem  to  collect 
and  analyze  the  data  required  to  carry  it  out  is  considerable,  and  unlikely  to  be 
attempted  by  any  but  the  most  patient  and  courageous  scientists. 

The  experimentalist  who  uses  reaction-time  measures,  then,  is  in  a  quandary, 
concerned  on  the  one  hand  that  contamination  and  other  assumption  failures 
might  influence  conclusions  based  on  conventional  (ordinary  least  squares) 
methods  (which  are  well-understood  theoretically),  but  afraid  on  the  other 
hand  that  the  techniques  available  to  limit  the  effects  of  such  contamination 
(which  are  less  well  understood  theoretically)  may  introduce  other  difficulties. 


2.  An  Empirical  Approach 

One  alternative  to  choosing  a  method  arbitrarily  and  also  to  engaging  in  the 
theoretical  analysis  advocated  by  Draper  and  Smith  is  the  empirical  approach 
that  we  consider  in  the  present  report.  It  involves  choosing  a  data  analysis 
method  in  the  same  way  as  we  choose  an  experimental  procedure.  Roughly 
speaking, we propose that a legitimate acceptance criterion for either an
experimental procedure or an analysis method is whether the results it produces
tend to be orderly and to make sense.


2. Additivity of factor effects is important in relation to the idea that some mental processes
are organized in stages that can be selectively influenced by experimental factors; linearity
arises if a mental process includes a sequence of operations whose number reflects the
number of task components to be accomplished, and whose mean duration is independent of
this number. An example of additivity that pervades many reaction-time studies involves
the individual differences in mean RT in experiments in which sensory, perceptual, or
mnemonic factors are manipulated; a substantial part of this variation appears to be
associated with decision or motor processes that are not systematically influenced by these
factors, processes that may occur after the completion of those operations that such factors
do influence. Insofar as such effects when measured in units of physical time are indeed
additive, such that subject effects can be "removed" as in the analysis of variance, any
nonlinear transformation would render them non-removable, and they would instead enter
into the effects of interest, biasing the estimated mean values of such effects, and making
them appear more subject to individual differences than they "really" are.


Whether  a  particular  description  or  characterization  of  a  set  of  data  "makes 
sense"  can  probably  be  decided  only  years  after  the  data  have  been  collected; 
"making  sense"  here  means  being  consistent  with  related  findings  and  relevant 
theory.  Orderliness,  however,  seems  to  be  a  quality  that  we  can  judge 
reasonably  well  without  using  a  distant  vantage  point.  For  this  purpose,  any 
measure  of  order  applied  to  the  data  description  produced  by  a  particular 
method  must,  of  course,  be  known  not  to  have  been  forced  on  the  data  by  the 
method.  Some  examples  should  help  to  clarify  our  notion  of  order: 

2.1  Invariance  of  the  estimated  effects  of  one  factor  (Factor  2)  over  levels  of 
another  (Factor  1) 

Suppose  that  separate  analyses  are  conducted  for  each  level  of  an 
experimental  factor  (Factor  1).  Then  insofar  as  the  estimated  effect  of  a  second 
factor  (Factor  2)  is  closer  to  being  identical  in  each  of  these  analyses  (i.e.,  the 
effects  of  the  two  factors  are  additive),  and  there  is  no  reason  to  expect  an 
interaction,  that  analysis  method  is  to  be  preferred.  The  measure  of  similarity 
(or divergence) of the effects of Factor 2 across levels of Factor 1 should
probably  include  an  adjustment  for  difference  in  mean  effect  size  (e.g.,  the 
mean  effect  of  Factor  2  over  levels  of  Factor  1)  across  methods.  There  are  two 
senses  in  which  the  estimated  effects  of  Factor  2  might  vary  in  similarity  across 
levels  of  Factor  1:  they  might  differ  systematically,  or  nonsystematically.  In  both 
cases  the  argument  is  based  on  a  principle  of  parsimony,  or  the  assumption  that 
"nature"  or  "truth"  is  simple.  Results  indicating  less  nonsystematic  variation 
are  therefore  closer  to  the  "truth,"  as  are  results  that  indicate  invariance  rather 
than  systematic  effects. 
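
One way to make the adjusted similarity measure concrete is sketched below. It is only an illustration: the function name `relative_divergence`, the data layout, the use of a root-mean-square deviation, and the normalization by the range of the mean effect profile are all assumptions made for this example, not a measure specified in this report.

```python
def relative_divergence(effects):
    """Spread of Factor 2 effects across Factor 1 levels, relative to effect size.

    effects[j][k] is the estimated effect of level k of Factor 2 obtained from
    the separate analysis at level j of Factor 1 (hypothetical layout).
    """
    n_levels1 = len(effects)
    n_levels2 = len(effects[0])
    # Mean effect profile of Factor 2, averaged over levels of Factor 1.
    mean_profile = [sum(row[k] for row in effects) / n_levels1
                    for k in range(n_levels2)]
    # Root-mean-square deviation of the separate analyses from that profile.
    sq = sum((row[k] - mean_profile[k]) ** 2
             for row in effects for k in range(n_levels2))
    rms_dev = (sq / (n_levels1 * n_levels2)) ** 0.5
    # Normalize by the size of the mean effect so that methods that differ
    # in overall scale can still be compared.
    effect_size = max(mean_profile) - min(mean_profile)
    return rms_dev / effect_size
```

Under this sketch a smaller value indicates effects of Factor 2 that are closer to invariant over levels of Factor 1, and multiplying every estimate by a constant leaves the value unchanged.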

2.2  Minimum  variability  over  individual  subjects 

Suppose  that  a  separate  analysis  is  used  for  each  subject’s  data,  and  we 
derive  a  measure  of  the  effect  of  some  experimental  factor  on  performance. 
Suppose  further  that  the  analysis  imposes  no  constraint  on  the  size  of  this 
effect.  Then  the  results  are  more  orderly  insofar  as  the  variation  across  subjects 
in  the  size  of  the  effect  is  smaller.  (As  the  measure  of  variability  it  may  be 
appropriate  to  use  a  coefficient  of  variation  —  i.e.,  dispersion  as  a  proportion 
of  mean  effect  size  —  to  compensate  for  any  differences  in  scale  associated 
with  different  methods.)  Note  that  this  criterion  can  be  regarded  as  an  instance 
of  the  first,  where  Factor  1  now  represents  a  "random  effect"  associated  with 
subjects.  The  argument  here  depends  on  the  idea  that  the  measured  variation 
reflects  a  combination  of  "true"  variability  across  subjects  plus  "nontrue" 
variability  that  derives  from  measurement  error  or  other  sources  of  "random" 
influences. Since the analysis places no constraint on the estimated size of the
effect, and separate analyses are performed for separate subjects, a method
that produces less variation in effect size must be less sensitive to these other
sources,  and  must  thus  give  values  closer  to  the  "truth". 
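
The coefficient of variation suggested above can be computed as in the following minimal sketch. The function name and the example numbers are hypothetical, and the choice of the sample standard deviation as the dispersion measure is an assumption; the report does not say which dispersion statistic was used.

```python
from statistics import mean, stdev

def coefficient_of_variation(effect_sizes):
    """Sample standard deviation across subjects divided by the mean effect size."""
    return stdev(effect_sizes) / abs(mean(effect_sizes))

# Hypothetical per-subject effect estimates (one value per subject) from two methods.
method_a = [8.0, 10.0, 12.0]
method_b = [9.0, 10.0, 11.0]
more_orderly = min(("A", method_a), ("B", method_b),
                   key=lambda item: coefficient_of_variation(item[1]))[0]
```

Because the dispersion is divided by the mean effect, doubling every estimate leaves the result unchanged, which is the compensation for scale differences mentioned in the text.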

2.3  Minimum  residual  variability 

More  generally,  any  indication  that  a  method  produces  results  that  are  less 
sensitive  to  random  variation  (relative  to  true  variation)  should  cause  us  to 
favor  that  method. 

2.4  Linearity  of  the  effects  of  quantitative  factors 

Given  that  the  methods  under  consideration  do  not  force  linearity  on  the 
effect  of  a  quantitative  factor,  the  method  that  produces  the  best  linearity  is  to 
be  preferred.  The  argument  is  again  based  on  a  principle  of  parsimony,  as  in 
the  case  of  the  first  criterion,  together  with  the  idea  that  except  for  a  null  effect 
of  a  factor,  the  simplest  effect  is  a  linear  one.  Because  the  form  of  a  functional 
relationship  depends  on  the  scales  of  measurement  of  both  the  factor  and  the 
response,  this  criterion  can  only  be  applied  if  there  is  some  basis  for  choosing  a 
measurement  scale  in  each  case.  Note  that  linearity  can  be  regarded  as  another 
instance  of  invariance,  or  additivity:  the  effect  of  a  quantitative  factor  is  linear 
insofar  as  the  effect  of  a  fixed-size  increment  in  the  factor  is  invariant  across 
different  initial  levels. 
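
As an illustration of how "best linearity" might be quantified, the sketch below fits a straight line to the estimated effects at each level of a quantitative factor and reports the root-mean-square deviation from that line. The function name, the unweighted fit, and the example values are assumptions made for this example only.

```python
def deviation_from_linearity(levels, effects):
    """Fit effects = intercept + slope * level by least squares.

    Returns (slope, intercept, rms), where rms is the root-mean-square
    deviation of the effects from the fitted line.
    """
    n = len(levels)
    mean_x = sum(levels) / n
    mean_y = sum(effects) / n
    sxx = sum((x - mean_x) ** 2 for x in levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, effects))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(levels, effects)]
    rms = (sum(r * r for r in residuals) / n) ** 0.5
    return slope, intercept, rms
```

For two methods applied to the same data, the one with the smaller deviation (perhaps expressed as a proportion of the fitted slope) would be preferred under this criterion, provided neither method forces linearity on the estimates.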

2.5 Similarity of small-sample and large-sample characterizations of data

Suppose  that  data  are  partitioned,  by  subject  for  example,  or  by  level  of  a 
treatment  factor,  and  we  perform  separate  analyses  of  the  data  subsets  as  well 
as  analysis  of  their  union.  Then  that  method  is  better  whose  characterizations  of 
the  subsets  are  more  similar  to  its  characterization  of  the  union.  Because  this 
property  will  generally  be  associated  with  the  characterizations  of  the  subsets 
being  similar  to  each  other,  it  can  be  regarded  as  another  instance  of  the  first. 

2.6 Clarity of choice among models

We  often  attempt  to  use  our  data  to  select  among  alternative  theories.  In 
the  present  context  a  theory  is  tested  by  representing  it  as  a  regression  model, 
fitting  it  to  the  data,  and  measuring  goodness  of  fit.  The  theory  that 
corresponds  to  the  best-fitting  model  is  then  selected.  A  method  is  good  insofar 
as  this  test  leads  to  the  same  conclusion  for  different  data  sets. 
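
A simple index of this kind of consistency is the proportion of data sets in which the most frequently selected model is the best-fitting one. The sketch below is a hypothetical illustration; the function name, the model labels, and the lack-of-fit numbers are invented for the example and are not taken from this report.

```python
from collections import Counter

def selection_consistency(fit_errors):
    """Return (most often selected model, proportion of data sets selecting it).

    fit_errors is a list with one dict per data set, mapping a model name to
    its lack of fit (smaller is better) for that data set.
    """
    winners = [min(errors, key=errors.get) for errors in fit_errors]
    model, count = Counter(winners).most_common(1)[0]
    return model, count / len(winners)

# Hypothetical lack-of-fit values for two candidate models on four data sets.
example = [
    {"linear": 1.0, "quadratic": 2.0},
    {"linear": 0.5, "quadratic": 0.7},
    {"linear": 3.0, "quadratic": 2.5},
    {"linear": 1.0, "quadratic": 4.0},
]
```

A regression method that yields a proportion closer to 1 over the same collection of data sets would be judged better under this criterion.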


In  the  sections  that  follow  we  discuss  the  background  for  tests  using  the 
first,  third,  fourth,  and  sixth  of  the  above  criteria  that  we  have  applied  in  a 
comparison of a particular robust regression method with ordinary least squares.


3. Source of the data used for these tests: Experiments on the transformation
of short-term visual memory assessed by retrieval-time measurement

The  data  to  which  we  applied  the  comparative  analysis  were  generated  in  an 
experiment  in  which  we  measured  the  time  to  name  a  specified  digit  in  a 
horizontal  row  of  from  2  to  6  briefly-displayed  digits.  The  target  was  specified 
by means of a "marker" stimulus that denoted its spatial location. In different
blocks of trials we presented the marker at one of six different times relative to
the onset of the array display, ranging from 350 msec before to 650 msec after
the onset, and denoted -350 msec, . . . , +650 msec. In some blocks of trials the
marker was visual: a 50-msec display of two vertical line segments, one above
and  one  below  the  target’s  location.  In  other  trial  blocks  the  marker  was  tactile: 
a  50-msec  vibration  applied  to  one  of  six  fingers;  subjects  had  learned  the 
correspondence  between  six  fingertips  and  the  six  possible  positions  that  defined 
the  display  area  and  that  could  be  occupied  by  digits.  Combination  of  the  six 
marker  delays  and  two  marker  modalities  defined  twelve  conditions,  in  each  of 
which  we  could  examine  the  effect  of  array  size  on  mean  reaction-time.  Each  of 
six  subjects  had  14  hours  of  practice,  followed  by  22  hours  of  testing;  the  result 
was  a  data  set  for  each  condition  and  each  subject  that  contained  about  500 
observations.  Each  subject  provided  us  with  twelve  such  data  sets;  the 
experiment  therefore  produced  72  such  data  sets,  which  we  treated  separately  in 
our  regression  analyses. 
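
As a check on the bookkeeping, the counts in this paragraph multiply out as follows; the final figure is approximate because each data set contained only about 500 observations.

```latex
\begin{align*}
6 \text{ delays} \times 2 \text{ modalities} &= 12 \text{ conditions},\\
12 \text{ conditions} \times 6 \text{ subjects} &= 72 \text{ data sets},\\
72 \text{ data sets} \times 500 \text{ observations} &\approx 36{,}000 \text{ observations}.
\end{align*}
```

This agrees with the set of about 36,000 observations mentioned in the abstract.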

At  the  start  of  a  trial,  subjects  fixated  in  the  center  of  the  six-position 
display area. To avoid confounding the principal experimental factor, the
number of displayed elements (array size), with their separation, we placed
elements in contiguous locations. To reduce (but not eliminate) the
confounding of array size with retinal eccentricity of the possible target
elements (and hence of the possible marker locations), we placed the arrays at
all possible positions within the display area: an array of size s could be placed
in any of 7 − s such positions.
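
The count of 7 − s placements can be verified by enumeration. In the sketch below, the numbering of display positions from 1 to 6 and the function name `placements` are assumptions made for illustration.

```python
def placements(s, n_positions=6):
    """All ways to place s contiguous elements within n_positions display positions.

    Each placement is the tuple of occupied positions, numbered from 1.
    """
    return [tuple(range(start, start + s))
            for start in range(1, n_positions - s + 2)]

# Number of placements for each array size used in the experiment.
counts = {s: len(placements(s)) for s in range(2, 7)}
```

For example, an array of size 5 can occupy positions 1 to 5 or positions 2 to 6, and an array of size 6 has only one possible placement.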

In  assessing  the  effect  of  array  size  (our  principal  goal)  one  has  to  consider  a 
number  of  other  factors  that  are  also  known  to  influence  performance.  The 
reasons  they  must  be  considered  explicitly  are  twofold:  In  some  cases  we  select 
their  levels  randomly  because  the  experiment  is  too  small  to  permit  complete 
balancing;  in  other  cases  an  orthogonal  design  is  impossible  and  there  is  some 
degree  of  inherent  confounding.  Indeed,  it  is  for  these  reasons  that  an  explicit 
multiple  regression  method  is  necessary,  rather  than  a  more  standard  analysis 
suitable for multiway tables.³

The other main factors are as follows: First, there is the serial position of the
target  element  within  the  array.  Because  the  number  of  serial  positions  changes 
from  one  array  size  to  another,  and  because  we  cannot  specify  a  correspondence 
between  positions  in  arrays  that  differ  in  size,  the  serial  position  effect  is 
regarded  as  a  separate  effect  for  each  array  size.  Second,  there  is  the  position  of 
the  target  (and  hence  of  the  marker)  within  the  display  area:  its  absolute 
position.  Our  findings  have  forced  our  model  of  the  absolute  position  effect  to 
be  somewhat  complicated:  the  effect  seems  to  be  systematically  smaller  for 
targets that are the leftmost or rightmost elements of an array (end elements)
than for interior elements. This generates a complicated interaction
between  serial  position  and  absolute  position,  which  is  embedded  in  the 
multiple  regression  model  that  we  fit  to  the  data.  The  third  main  factor  is  the 
identity  of  the  target  element  (one  of  the  ten  digits,  in  the  present  experiment) 
that  must  be  identified  and  named.  Some  target  elements  are  associated  with 
longer  reaction  times  than  others,  possibly  because  of  identification-time 
differences,  or  differences  in  naming  latency  given  the  identity,  or  differences  in 
measurement  delay  of  our  speech-onset  detector  possibly  related  to  the  initial 
sound  of  the  spoken  name. 

In  other  experiments  using  the  present  paradigm  the  effect  of  array  size  on 
mean  reaction-time  has  been  closely  approximated  by  a  linear  function,  and  this 
has  been  the  case  for  all  the  delays  studied.  The  aspect  of  our  findings  that  we 
regard  of  most  importance  is  the  change  in  the  parameters  of  this  linear 
function  as  the  marker  is  delayed:  When  the  marker  shortly  precedes  or  is 
simultaneous  with  the  array,  the  slope  of  the  linear  function  is  very  close  to 
zero:  i.e.,  there  is  essentially  no  effect  of  array  size.  As  the  marker  is  delayed, 
the  slope  grows  systematically,  reaching  what  appears  to  be  an  asymptote  at  a 
delay  of  about  a  second;  most  of  the  change  has  occurred  within  about  two 
thirds  of  a  second.  Together  with  results  from  other  paradigms  that  we  have 
used,  these  findings  indicate  to  us  the  existence  of  a  rapid  and  dramatic 
transformation  of  the  internal  representation  of  the  visual  display,  such  that  the 
initial  representation  manifests  a  property  of  direct  access  by  spatial  location, 
and  that  this  property  is  eliminated  as  the  transformation  proceeds. 

For a synopsis of earlier experiments using a visual marker, as well as the
present experiment in which tactile and visual markers are compared, see
Sternberg, Knoll, & Turock, 1985. For a detailed account of findings from the
earlier experiments, as well as a summary of results from three other paradigms,
also aimed at investigating the first second of visual memory, see Turock, 1985,
and Sternberg, Knoll, & Turock, 1986.


3. Note, however, that robust alternatives to the analysis of orthogonal experiments are also of
great interest, and could be assessed in ways similar to those exemplified in the present
report.

One  aim  of  our  analyses,  then,  is  to  permit  an  assessment  of  linearity  and, 
given  linearity,  to  characterize  the  obtained  linear  functions  by  intercept  and 
slope  parameters. 

4.  The  Regression  Model 

The  regression  model  can  be  expressed  as  follows,  where  RT  denotes  mean 
reaction-time: 

$$\mathrm{RT} = \mu + \alpha_s + \beta_{si} + \gamma_p + \delta_d .$$

The $\{\alpha_s\}$, which represent the array-size effect, are defined for s = 2, 3, 4, 5,
and 6. They are constrained to sum to zero, and thus reflect 4 degrees of
freedom (df). The $\{\beta_{si}\}$, which represent the serial-position effect, are defined
for each s, and for i = 1, 2, . . . , s; for each s-value they are constrained to sum
to zero, and thus reflect 15 df. The $\{\gamma_p\}$ represent the two assumed
absolute-position effects, one for end elements, defined for p = 1, 2, 3, 4, 5, and 6, and
constrained to sum to zero, and the other for interior elements, defined for
p = 2', 3', 4', and 5', and constrained separately to sum to zero. The
absolute-position effects therefore reflect 8 df. Finally, the $\{\delta_d\}$ represent the
element-identity effect and are defined for d = 0, 1, . . . , 8, and 9, and constrained to
sum to zero, thus reflecting 9 df. The number of degrees of freedom in the
model being fitted to the data is therefore 39.
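
To illustrate how a sum-to-zero constraint turns a factor with k levels into k − 1 free parameters, the sketch below builds "effect-coded" predictor columns for one observation. This is one common way to impose such a constraint and is offered only as an assumption for illustration; the report does not describe how the constraints were implemented in S, and the function name is hypothetical.

```python
def sum_to_zero_columns(level, levels):
    """Effect-code one observation of a factor with len(levels) levels.

    Returns len(levels) - 1 predictor values. Each of the first k - 1 levels
    has its own column; the last level is coded -1 in every column, so its
    implied effect is minus the sum of the other effects.
    """
    k = len(levels)
    index = levels.index(level)
    if index == k - 1:
        return [-1.0] * (k - 1)
    return [1.0 if j == index else 0.0 for j in range(k - 1)]

# Array size has five levels, so the coding yields four columns (4 df).
array_sizes = [2, 3, 4, 5, 6]
rows = [sum_to_zero_columns(s, array_sizes) for s in array_sizes]
```

With this coding the fitted coefficients are the effects of the first four array sizes, and the effect for array size 6 is recovered as the negative of their sum.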


5.  The  two  regression  methods 

We  applied  each  regression  method  to  each  of  the  72  data  sets  discussed 
above.  The  vector  of  observations  in  each  data  set  contained  about  500  entries, 
while  the  parameter  vector  contained  39.  Both  regression  methods  are  available 
as  functions  within  the  S  language  (Becker  &  Chambers,  1984).  The  first 
method  was  ordinary  least  squares  regression,  called  reg  within  S.  The  second 
method  was  robust  regression,  or  iteratively  reweighted  least  squares,  and  called 
rreg  within  S.  We  used  the  variant  of  robust  regression  that  embodied  the 
Huber  weighting  function  with  constant  1.345.  For  readers  unfamiliar  with  this 
method  we  provide  a  brief  description.  (For  details  see  Coleman,  Holland, 
Kaden,  Klema,  &  Peters,  1980.) 

Suppose  we  have  a  set  of  parameter  estimates;  the  first  such  set  might  be 
obtained  by  ordinary  least-squares  regression;  later  sets  are  obtained  by 
iteration. Let $\{r_k\}$ be the set of residuals obtained by fitting the model with a
particular set of parameter estimates to our data set (of approximately 500
observations). Define $u$ as a scale parameter given by the median absolute
residual divided by 0.6745. Now define a weight, $w_k$, as follows:

$$w_k = \begin{cases} 1, & \text{if } |r_k| \le 1.345\,u, \\[4pt] \left( \dfrac{1.345\,u}{|r_k|} \right)^{1/2}, & \text{if } |r_k| > 1.345\,u. \end{cases}$$

Thus, all observations contribute to the solution, and observations that differ
from the model prediction based on the last set of parameter estimates by no
more than 1.345 estimated scale units are given unit weight, whereas
observations that differ by more than that amount are increasingly
downweighted. The iterated set of parameter estimates is now obtained by
minimizing the sum of squared weighted residuals, where each residual $r_k$ is
weighted by $w_k$. The iteration continues until a criterion of convergence is met.
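
The iteration just described can be sketched in a few lines. The code below is not the S function rreg; it is a minimal illustration restricted to a one-predictor model (intercept and slope) so that it needs only the Python standard library. The function names, the convergence rule (change in the two parameter estimates), and the iteration limit are assumptions made for this example.

```python
from statistics import median

def weighted_line_fit(x, y, w):
    """Minimize sum((w_k * (y_k - a - b * x_k)) ** 2) over intercept a and slope b."""
    q = [wk * wk for wk in w]  # squared weights multiply the squared residuals
    total = sum(q)
    mean_x = sum(qk * xk for qk, xk in zip(q, x)) / total
    mean_y = sum(qk * yk for qk, yk in zip(q, y)) / total
    sxx = sum(qk * (xk - mean_x) ** 2 for qk, xk in zip(q, x))
    sxy = sum(qk * (xk - mean_x) * (yk - mean_y) for qk, xk, yk in zip(q, x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def huber_weights(residuals, c=1.345):
    """Unit weight within c scale units; (c * u / |r|) ** 0.5 beyond that."""
    u = median(abs(r) for r in residuals) / 0.6745  # scale estimate
    return [1.0 if abs(r) <= c * u else (c * u / abs(r)) ** 0.5
            for r in residuals]

def huber_irls_line(x, y, c=1.345, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares for a straight line, Huber weights."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from ordinary least squares
    w = [1.0] * len(x)
    for _ in range(max_iter):
        residuals = [yk - a - b * xk for xk, yk in zip(x, y)]
        w = huber_weights(residuals, c)
        a_new, b_new = weighted_line_fit(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w
```

Calling `huber_irls_line(x, y)` on data that contain a few very slow responses should return a slope less affected by them than the ordinary least-squares slope, together with the final weights, in which the aberrant observations have values below 1.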


6. First test: Extent of invariance across probe delays in the effect of target
identity

The values of the $\delta_d$ parameters represent the effect of target identity, and
do  in  fact  differ  reliably  from  digit  to  digit.  To  convey  the  magnitude  of  this 
effect,  we  determined  the  mean  value  of  each  of  the  ten  parameter  estimates 
over  the  six  delays  for  each  subject  and  each  modality,  and  determined  the 
range  of  the  resulting  ten  means.  We  then  determined  the  mean  of  this  range 
over  the  six  subjects  for  each  modality,  with  the  results  shown  separately  in 
Table 1 for ordinary and robust regression.

Table 1

Range over Target Elements of Mean Target-Element Parameter $\delta_d$.
(Mean taken over six delays.)

Modality:        Visual                 Tactile
Regression:      Ordinary    Robust     Ordinary    Robust

Subject:
  1              59.1        62.7       69.4        69.9
  2              47.4        41.2       65.7        54.2
  3              85.7        59.7       78.8        80.9
  4              37.7        34.5       79.0        74.6
  5              63.5        67.0       74.4        75.8
  6              53.8        58.1       55.1        55.0

Mean Range:      54.5        53.5       70.4        68.4
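
The summary statistic reported in Table 1 can be sketched as follows for one subject and one modality. The data layout (one list of ten target-element estimates per delay) and the function name are assumptions made for illustration, and the numbers in the usage example are invented rather than taken from the experiment.

```python
def range_of_mean_effects(estimates_by_delay):
    """Range, over target elements, of the mean parameter estimate over delays.

    estimates_by_delay[t][d] is the estimate for target element d obtained
    from the analysis at delay t (hypothetical layout).
    """
    n_delays = len(estimates_by_delay)
    n_targets = len(estimates_by_delay[0])
    means = [sum(row[d] for row in estimates_by_delay) / n_delays
             for d in range(n_targets)]
    return max(means) - min(means)

# Invented estimates for three target elements at two delays.
example_range = range_of_mean_effects([[-10.0, 0.0, 10.0],
                                       [-20.0, 0.0, 20.0]])
```

Averaging the resulting ranges over the six subjects, separately for each modality and regression method, would give entries corresponding to the "Mean Range" row.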

The  range  of  target-element  parameters  over  target  elements  is  clearly 
affected  to  a  negligible  extent,  if  at  all,  by  the  choice  of  regression  method.  This 
means  that  variability  estimates  can  be  compared  across  methods  without  being 
adjusted  for  scale  differences.  Note,  however,  that  there  is  a  clear  difference 
between  ranges  for  the  two  modalities.  We  believe  that  this  reflects  greater 
unreliability  of  the  data,  and  therefore  of  estimates  based  on  those  data,  for 
tactile  than  visual  probes;  evidence  that  favors  this  interpretation  will  be 
presented  below. 

Now  we  proceed  to  the  test  of  invariance  of  digit  effects  over  probe  delays. 
Such  invariance  is,  of  course,  an  example  of  a  criterion  of  the  first  type 
discussed  in  Section  2,  and  we  need  to  consider  why  we  might  either  expect  or 
not  expect  an  interaction.  During  the  time  interval  between  the  probe  and 
detection  of  the  subject’s  response  there  seem  to  be  at  least  four  processes  that 
are  plausible  loci  of  target-identity  effects.  Recall  that  the  identity  of  the  target 
specifies  the  (correct)  spoken  response  (its  name).  The  first  plausible  locus  is  in 
the  process  of  deriving  the  identity  of  the  target  from  its  internal 
representation.  The  second  is  in  the  process  of  "computing"  and  preparing  the 
vocalization  of  the  name  from  the  derived  identity.  The  third  is  initiating  the 
vocalization.  The  fourth  locus  is  external  to  the  subject,  but  indubitably  a 
contributor  to  the  measured  reaction  time:  the  process  in  our  hardware  and 
software  of  detecting  the  onset  of  speech.  Because  speech-onset  detectors  are 
imperfect,  responding  with  different  speeds  to  different  initial  sounds,  this 
process  shares  the  property  of  the  other  three  of  being  a  potential  locus  of  a 
target-identity  effect. 

Once the identity has been derived there appears to be no basis for any
interaction  between  marker  delay  and  target  identity:  we  would  expect  any 
target  identity  effects  in  the  second,  third,  and  fourth  of  the  processes  just 
mentioned  to  be  invariant  across  delays. 

We  now  turn  to  the  first  of  these  hypothesized  processes.  Here  there  would 
seem  to  be  a  possible  source  of  an  interaction  between  marker  delay  and  target 
identity,  as  well  as  of  a  marker-identity  effect.  The  state  of  the  representation 
from  which  the  identity  is  derived  presumably  depends  on  marker  delay. 
Suppose that R_1, a "raw" visual representation, is transformed during the delay
into R_2, perhaps a more abstract representation and one from which it takes
less  time  to  derive  the  identity.  The  transformation  thus  facilitates  the  process 
of  deriving  the  identity.  It  is  possible  that  the  magnitude  of  this  facilitation 
would  be  different  for  different  targets;  if  so,  the  target-identity  effect  should 
change  with  —  that  is,  interact  with  —  probe  delay. 

This  argument  does  not  lead  to  a  strong  expectation  of  an  interaction 
because  the  existence  of  a  target-identity  effect  does  not  require  that  any  of  it 
be  localized  in  this  first  process,  which  is  the  only  plausible  locus  among  the 
four  processes  mentioned  for  an  interaction.  Furthermore,  as  argued  in  Section 
2,  if  such  an  interaction  existed,  there  is  no  reason  why  it  should  be 
systematically  obscured  by  the  analysis  since  the  regression  method  was  applied 
independently  to  data  from  different  marker  delays.  Hence  if  one  method 
produces  a  closer  approximation  to  invariance  than  another,  it  is  to  be 
preferred. 

To assess the extent of invariance over probe delays in the parameter
estimates provided by each regression method we determined, for each subject
and modality, and for each of the ten target elements (digits), d, the variance
over the six delays of the corresponding parameter estimate. We then averaged
these variances over target elements; results are displayed in Table 2. The
range analysis (see Table 1) indicates that there is no need to scale the
variances differentially for the two regression methods.
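
The computation just described can be sketched as follows. This is only an illustration: the function name and the numbers are hypothetical, not estimates from the experiment, and the n - 1 divisor is an assumption (it is the divisor implied by the half-squared-difference rule used for two modalities in Section 7).

```python
from statistics import mean, variance

def mean_variance_over_delays(estimates):
    """estimates[d] holds the parameter estimates for digit d, one per probe delay.

    Returns the mean, over digits, of the sample variance (n - 1 divisor)
    taken over delays.
    """
    return mean(variance(per_delay) for per_delay in estimates)

# Hypothetical estimates (msec): 3 digits x 6 delays, for illustration only.
example = [
    [10.0, 12.0, 8.0, 11.0, 9.0, 10.0],
    [-5.0, -7.0, -4.0, -6.0, -5.0, -3.0],
    [0.0, 2.0, -1.0, 1.0, -2.0, 0.0],
]
print(mean_variance_over_delays(example))
```

One such value would be obtained per subject, modality, and regression method.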

Table 2

Variance of target-element parameters over six probe delays (msec²).
(Quantities shown are mean variances, where mean is taken
over target elements.)

Modality:              Visual                Tactile
Regression:       Ordinary   Robust     Ordinary   Robust

Subject:    1        314       154         248       187
            2        403       292         591       344
            3        330       278         751       474
            4        437       180         990       390
            5        231       153         757       312
            6        432       246         727       279

Mean Variance:       358       217         677       331

Note first that in all twelve comparisons (two per subject) robust
regression provides a reduction in variance. The mean variances over subjects
reveal a 39% reduction for visual markers (from 358 to 217 msec²) and a 51%
reduction for tactile markers (from 677 to 331 msec²). The difference between
regression methods is statistically significant (p < .01) for both marker
modalities. This finding argues strongly in favor of the robust method.
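
The percentage reductions follow directly from the bottom row of Table 2. A minimal sketch of the arithmetic (the helper name is hypothetical):

```python
# Mean variances (msec^2) from the bottom row of Table 2: (ordinary, robust).
table2_means = {"visual": (358.0, 217.0), "tactile": (677.0, 331.0)}

def percent_reduction(ordinary, robust):
    """Reduction in variance, as a percentage of the ordinary-regression value."""
    return 100.0 * (ordinary - robust) / ordinary

for modality, (ordinary, robust) in table2_means.items():
    print(f"{modality}: {percent_reduction(ordinary, robust):.0f}% reduction")
```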


7. Second test: Extent of invariance across probe modalities in the effect of
target identity

As  discussed  above,  we  can  conceive  of  a  possible  basis  for  a  systematic 
effect  of  the  delay  of  the  probe  on  target-element  parameters.  On  the  other 
hand,  we  can  think  of  no  reason  to  expect  differences  between  target  digits  to 
depend  on  the  modality  of  the  probe.  Indeed,  the  remarkable  similarity  in  the 
effect  of  delay  on  the  slope  of  the  function  relating  mean  reaction-time  to  array 
size  is  consistent  with  the  idea  that  once  the  critical  location  has  been 
ascertained  on  the  basis  of  information  provided  by  the  probe,  the  remaining 
processes,  leading  up  to  identifying  and  orally  naming  the  element  in  that 
critical  location,  are  the  same,  regardless  of  the  modality  by  which  the  location 
information  was  conveyed.  It  follows  that  we  expect  to  find  invariance  of  the 
target-element  parameters  over  modalities,  and  that  insofar  as  a  method  reveals 
greater  invariance,  it  is  to  be  preferred. 

To assess the extent of the invariance over modalities we proceeded as
follows:  For  each  target  element  and  each  delay  we  determined  the  variance  of 
the  (two)  target-element  parameters  over  modalities,  by  calculating  half  of  their 
squared  difference.  For  each  subject  and  each  regression  type  we  then  averaged 
this  quantity  over  the  six  delays  and  ten  target  elements.  The  results  are 
displayed  in  Table  3. 
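
The half-squared-difference rule is simply the sample variance (n - 1 divisor) of two values: each value deviates from the pair mean by half the difference, so the sum of squared deviations is (a - b)²/2, and dividing by n - 1 = 1 leaves it unchanged. A small check, with hypothetical parameter values:

```python
from statistics import variance

def variance_of_pair(a, b):
    """Sample variance of two values, computed as half their squared difference."""
    return (a - b) ** 2 / 2.0

# Hypothetical target-element parameters (msec) for one digit at one delay.
visual, tactile = 12.0, 4.0
print(variance_of_pair(visual, tactile), variance([visual, tactile]))
```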

Table 3

Variance of target-element parameters over
two modalities (msec²). (Quantities shown
are mean variances, where mean is taken
over target elements and delays.)

Regression:       Ordinary    Robust

Subject:    1       247.8      122.8
            2       540.1      364.7
            3       720.1      436.5
            4       930.9      343.7
            5       417.6      166.9
            6       573.6      224.7

Mean Variance:      571.7      276.6

In  all  six  comparisons,  robust  regression  provides  a  reduction  in  variance, 
with a mean ratio of 0.49 ± 0.05. Like the finding in the section above, this
additional  application  of  the  first  criterion  discussed  in  Section  2  argues  strongly 
in  favor  of  the  robust  method. 
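
The mean ratio can be recomputed from the per-subject entries of Table 3. The sketch below assumes that the ± value is the standard error of the mean ratio over the six subjects; the report does not say so explicitly.

```python
from math import sqrt
from statistics import mean, stdev

# Per-subject mean variances (msec^2) from Table 3.
ordinary = [247.8, 540.1, 720.1, 930.9, 417.6, 573.6]
robust = [122.8, 364.7, 436.5, 343.7, 166.9, 224.7]

# Ratio of robust to ordinary variance for each subject.
ratios = [r / o for o, r in zip(ordinary, robust)]
mean_ratio = mean(ratios)
# Assumed interpretation of "±": standard error of the mean over subjects.
std_error = stdev(ratios) / sqrt(len(ratios))
print(f"mean ratio = {mean_ratio:.2f} +/- {std_error:.2f}")
```

Under that assumption the values round to the reported 0.49 and 0.05.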

We  have  thus  found  that  robust  regression  produces  estimates  of  target- 
element  parameters  that  are  more  invariant  over  both  the  delay  and  modality  of 
the probe than does ordinary least squares regression. One possible "truth"
that  the  robust  method  therefore  brings  us  closer  to  is  full  invariance  of  these 
parameters  relative  to  delay  and  modality:  in  that  case  any  apparent  failure  of 
invariance  would  reflect  nothing  more  than  the  influence  of  sampling  error  and 
contamination  in  the  basic  reaction-time  data  on  the  estimates  of  target-element 
parameters.  A  hint  in  this  direction  is  the  approximate  equality  of  the  variances 
displayed  in  Tables  2  and  3,  within  regression  type. 

To test the idea of full invariance we subjected the target-element
parameters derived from each of the two regressions to an analysis of variance,
with the factors Subject, Modality, Delay, and Target Element. To avoid effects
on the analysis of the linear dependence among target-element parameters
induced by the constraint that requires them to sum to one, we analyzed only
the first nine of the ten parameters.

Values of F-statistics from analysis of the parameters derived from ordinary
(robust) regression are as follows: In the first three F-tests, we evaluated a main
effect by comparing it to its interaction with subjects. We obtained, for target
element, F(8, 40) = 9.53 (10.61); for delay, F(5, 25) = 0.90 (1.33); and for
modality, F(1, 5) = 1.18 (0.08). For the fourth F-test we compared the subject
by target-element interaction to the four-way interaction, which we take as a
residual mean square; the result is F(5, 200) = 3.54 (6.87). Finally, the residual
mean square for target-element parameters from ordinary (robust) regression is
543.4 (264.1).

These  results  generate  two  clear  implications.  First,  the  values  of  F  for  delay 
and  modality  confirm  the  conjecture  of  full  invariance:  the  observed  variation  of 
parameters  over  delay  and  modality  is  small  enough  relative  to  the  residual  to 
be  entirely  due  to  random  variation.  This  conclusion  strengthens  the  inferences 
above  that  favor  the  method  producing  greater  invariance.  Second,  the  amount 
of  this  random  variation  is  halved  by  moving  from  ordinary  to  robust  regression, 
as measured by the residual mean square (see footnote 4). This result means that the third
criterion  of  Section  2  also  favors  the  robust  method. 
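
The "halving" can be read off the two residual mean squares reported above. Because the variance of a mean of n observations is σ²/n, reducing σ² by a given factor buys the same precision as increasing n by the inverse of that factor. A minimal sketch of the arithmetic (variable names are illustrative):

```python
# Residual mean squares (msec^2) reported for the two regression methods.
ms_ordinary = 543.4
ms_robust = 264.1

# Fraction of the ordinary-regression residual variance that remains.
variance_ratio = ms_robust / ms_ordinary

# Var(mean) = sigma^2 / n, so matching the robust precision with ordinary
# regression would require n to grow by the inverse of the variance ratio.
equivalent_sample_factor = 1.0 / variance_ratio

print(f"variance ratio = {variance_ratio:.3f}")
print(f"equivalent increase in sample size = {equivalent_sample_factor:.2f}x")
```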

8. Third test: Deviation from linearity of the array-size effect

As  mentioned  in  Section  3,  we  have  found  the  relation  of  RT  to  array  size  for 
data  averaged  over  groups  of  subjects  to  be  remarkably  well-approximated  by  a 
linear  function  at  all  marker  delays,  with  a  slope  that  is  close  to  zero  for 
negative  or  zero  delays  and  that  increases  rapidly  with  positive  marker  delays. 
(Data  from  several  experiments  that  support  this  generalization  can  be  found  in 
Sternberg,  Knoll,  &  Turock,  1985.) 

In applying the regression analyses under consideration to our data we
wished to assess the goodness of fit of such a linear function, so we deliberately
did not impose a linear constraint on the parameters {α_s} that represent the
effect of array size. We are thus able to compare the linearity of the estimates of
these parameters derived by means of the two regression methods. Such a
comparison bears on two of the criteria discussed in Section 2: the fifth
(methods are preferred for which small-sample and large-sample
characterizations of data are similar), and the fourth (methods are preferred
that produce better linearity).

Footnote 4. This reduction in noise is equivalent to doubling the size of the
experiment which, for data collection alone, would have added about $7000 to
the cost of the project.

We  performed  the  comparison  as  follows:  Each  method  provided  a  parameter 
set {α_s} for each of the 72 sets of data. We fitted a line to each set (by ordinary
least  squares  regression),  and  extracted  for  each  parameter  set  the  residual  mean 
square  after  fitting  the  line.  The  square  root  of  this  quantity  is  the  residual 
standard  error  (RSE).  For  each  subject  and  regression  method  we  determined 
the  mean  of  the  resulting  RSE’s  over  the  six  delays.  The  resulting  measures  of 
deviation  from  linearity  are  shown  in  Table  4  for  each  subject  and  marker 
modality,  and  for  each  type  of  regression. 
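
The RSE computation for a single parameter set can be sketched as below. The array sizes and parameter values are hypothetical, and the residual mean square is assumed to use n - 2 degrees of freedom (the usual choice for a fitted slope and intercept; the report does not state the divisor).

```python
from math import sqrt

def line_fit_rse(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, rse), where rse is the square root of the residual
    mean square, assumed here to have n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sqrt(ss_residual / (n - 2))

# Hypothetical array sizes and array-size parameters (msec), illustration only.
array_sizes = [1, 2, 3, 4]
size_effects = [10.0, 30.0, 20.0, 40.0]
slope, rse = line_fit_rse(array_sizes, size_effects)
print(f"slope = {slope:.2f} msec/element, RSE = {rse:.2f} msec")
```

Averaging such RSE values over the six delays gives one entry of the kind shown in Table 4.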


Table 4

Measure of deviation from linearity of the array-size effect
(RSE in msec), averaged over six probe delays.

Modality:              Visual                Tactile
Regression:       Ordinary   Robust     Ordinary   Robust

Subject:    1       7.33      9.83       12.17     10.83
            2      10.33     10.50       21.00     16.17
            3      11.83      9.83       21.83     19.17
            4      12.50     10.83       22.67     13.17
            5       9.17      7.17       16.67     14.33
            6       7.50      9.67       22.17     11.67

Mean:               9.78      9.64       19.42     14.22

While  there  is  virtually  no  difference  between  the  means  for  visual  markers, 
the  mean  RSE  for  tactile  markers  (considerably  greater)  is  reduced  for  every 
subject  as  we  move  from  ordinary  to  robust  regression,  with  a  mean  reduction  of 
27%,  and  is  brought  closer  to  the  RSE  for  visual  markers.  (Note,  however,  that 
even  the  results  from  robust  regression  differ  significantly  between  modalities, 
with  a  mean  difference  of  4.59  ±  1.35  msec,  so  that  although  the  robust  method 
comes  closer  to  the  desired  invariance,  it  is  not  achieved.)  An  analysis  of 
variance  applied  to  the  mean  RSE’s  showed  that  regression  type,  modality,  and 
their  interaction  were  all  significant  (p  <  .05);  the  effect  of  delay  was  also 
significant,  but  none  of  its  interactions  were. 

Although the advantage of robust over ordinary regression in this
comparison of array-size parameters is less dramatic than in the two
comparisons based on target-element parameters, it is nonetheless present,
providing us with independent but consistent conclusions.

9. Fourth test: Extent of invariance of the array-size effect across probe
modalities

The mean array-size effect is estimated by the slope of a line fitted to the
{α_s} parameters. Because the magnitude of this effect changes dramatically
with  delay,  and  because  we  have  no  principle  to  guide  us  in  aligning  the  delays 
for  probes  in  different  modalities,  it  is  hazardous  to  compare  the  effects  of  array 
size  across  modalities.  Nonetheless,  in  the  mean  data  it  seems  clear  that  by 
assuming  that  physically  equal  delays  for  the  two  probe  modalities  correspond 
psychologically  we  achieve  remarkably  equal  slopes.  (See  Figure  12  in 
Sternberg,  Knoll,  &  Turock,  1985.)  Thus  it  seemed  reasonable  to  expect 
invariance  of  the  array-size  effect,  averaged  over  the  six  delays,  across  probe 
modalities,  and  thus  to  ask  which  type  of  regression  produced  a  closer 
approximation  to  such  invariance.  Results  of  an  analysis  designed  to  provide  an 
answer  to  this  question  are  displayed  in  Table  5. 

Table 5

Comparison of Array-Size Effect Across Modalities.

Quantities shown are slopes, B (in msec/element), of lines fitted
to the {α_s} for visual (B_V) and tactile (B_T) probes,
and their absolute differences, |B_V - B_T|.

Regression:             Ordinary                      Robust
Measure:         B_V     B_T   |B_V - B_T|     B_V     B_T   |B_V - B_T|

Subject:    1   19.4    11.3      8.1         16.4    11.3      5.1
            2   19.0    20.2      1.0         17.4    18.3      0.9
            3   21.1    22.8      1.7         19.4    21.6      2.2
            4   22.9    13.7      9.2         19.5    15.7      3.8
            5   19.4    21.3      1.9         16.9    18.2      1.3
            6   26.1    22.8      3.3         22.6    19.4      3.2

Mean            21.3    18.7      4.20        18.7    17.4      2.75

For five of the six subjects the robust method produces greater invariance
across modalities of the mean array-size effect over delays, by this measure. The
difference (1.45 ± 0.93 msec/element) is not statistically significant, however, so
this test is inconclusive.
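
The reported difference can be recomputed from the absolute-difference columns of Table 5. The sketch assumes that the ± value is a standard error over subjects and that significance is judged with a paired t-test on 5 degrees of freedom; neither is stated in the report.

```python
from math import sqrt
from statistics import mean, stdev

# Absolute visual-tactile slope differences (msec/element) from Table 5.
abs_diff_ordinary = [8.1, 1.0, 1.7, 9.2, 1.9, 3.3]
abs_diff_robust = [5.1, 0.9, 2.2, 3.8, 1.3, 3.2]

# Positive values mean that robust regression gave the smaller difference.
improvement = [o - r for o, r in zip(abs_diff_ordinary, abs_diff_robust)]
subjects_favoring_robust = sum(d > 0 for d in improvement)

mean_improvement = mean(improvement)
std_error = stdev(improvement) / sqrt(len(improvement))
t_statistic = mean_improvement / std_error  # assumed paired t, 5 df

T_CRITICAL_5DF = 2.571  # two-tailed .05 critical value of t with 5 df
print(f"{mean_improvement:.2f} +/- {std_error:.2f}, t(5) = {t_statistic:.2f}")
print("significant at .05:", t_statistic > T_CRITICAL_5DF)
```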


10.  Fifth  test:  Clarity  of  model  comparisons 

To  compare  ordinary  and  robust  regression  with  respect  to  the  clarity  of 
comparisons  of  goodness  of  fit  of  alternative  models,  we  fitted  three  regression 
models,  competitive  with  the  model  described  in  Section  4  (which  we  denote 
Model  1),  to  each  of  our  72  sets  of  data,  using  both  regression  methods.  The 
exact  structure  of  the  three  models  is  not  important  for  the  present  report,  so 
we  provide  only  rough  descriptions.  The  three  competing  models  differ  from 
Model  1  in  the  structure  of  their  serial-position  and  absolute-position  effects. 

Model  2.  Two  separate  absolute-position  effects  are  contained  in  Model  1, 
one  for  end  elements,  and  the  other  for  interior  elements.  In  Model  2  no 
distinction  is  made  between  these  two  types  of  element:  The  same  absolute- 
position  effect  is  assumed  to  apply  to  both.  As  a  result  there  are  fewer  free 
parameters. 

Model 3. For any array-size in Model 1 the assumed serial-position effect is
fitted  in  such  a  way  that  the  position  of  the  array  in  the  display  area  is 
irrelevant.  Thus  the  leftmost  element  in  an  array  of  three  elements  that  is 
placed  as  far  to  the  left  in  the  six-element  display  area  as  possible  is  regarded  as 
having  the  same  serial  position  as  the  leftmost  element  in  an  array  that  is 
placed  as  far  to  the  right  as  possible.  In  Model  3,  separate  sets  of  parameters  are 
fitted  to  arrays  of  different  eccentricities  and  to  centered  arrays,  and,  moreover, 
serial  position  is  assigned  symmetrically  for  arrays  in  different  positions,  such 
that  the  leftmost  element  in  an  array  on  the  left,  for  example,  is  regarded  as 
having  the  same  serial  position  as  the  rightmost  element  in  an  array  on  the 
right.  This  model  has  more  free  parameters  than  Model  1. 

Model 4. In this model we use the simple absolute-position structure of
Model  2  and  the  complex  serial-position  structure  of  Model  3. 

For  each  of  the  72  data  sets  and  each  regression  type,  each  of  the  three 
competing  models  was  compared  to  Model  1,  using  two  methods,  as  follows:  In 
the  first  method,  we  computed  root  mean  square  residuals  (RMSR)  for  each 
model  and  then  determined  the  number  of  data  sets  favoring  the  most  favored 
model  in  each  pair.  The  second  method  was  the  same,  with  the  RMSR  replaced 
by  the  (more  robust)  median  absolute  residual  (MAR).  Results  are  displayed  in 
Table 6, where the model whose number is shown in boldface is the favored one.

Table 6

Number of 72 data sets favoring the favored model.

Statistic:              RMSR                   MAR
Regression:       Ordinary   Robust     Ordinary   Robust

Models:   1, 2       72        67          47        48
          1, 3       61        59          51        51
          1, 4       49        46          48        46

In  this  table,  higher  numbers  indicate  clearer  choices  between  models.  Of 
the  comparisons  using  the  RMSR,  all  three  favor  ordinary  over  robust 
regression,  but  with  a  difference  that  is  relatively  small.  Of  the  three 
comparisons  using  the  MAR,  robust  and  ordinary  regression  do  about  equally 
well.  More  microscopic  analysis  of  the  data,  in  which  we  considered  these 
comparisons separately for different delays and modalities, added nothing to the
impression  conveyed  by  Table  6:  For  these  data  and  models  the  regression 
methods  do  not  differ  substantially  in  relation  to  the  criterion  of  clarity  of 
model  comparisons. 
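
The two goodness-of-fit statistics and the counting rule behind Table 6 can be sketched as follows. The residuals are invented for illustration, the function names are hypothetical, and the RMSR here divides by the number of residuals with no adjustment for the number of fitted parameters, which is an assumption.

```python
from math import sqrt
from statistics import median

def rmsr(residuals):
    """Root mean square residual (no degrees-of-freedom adjustment assumed)."""
    return sqrt(sum(r * r for r in residuals) / len(residuals))

def mar(residuals):
    """Median absolute residual, which is less sensitive to a few large residuals."""
    return median(abs(r) for r in residuals)

def count_favoring_first(residuals_a, residuals_b, statistic):
    """Number of data sets for which model A has the smaller statistic."""
    return sum(statistic(a) < statistic(b) for a, b in zip(residuals_a, residuals_b))

# Hypothetical residuals (msec) from two models fitted to two data sets.
model_a = [[1.0, -2.0, 1.5, 30.0], [0.5, -0.5, 1.0, -1.0]]
model_b = [[4.0, -5.0, 6.0, -7.0], [2.0, -2.0, 3.0, -3.0]]

print("favoring A by RMSR:", count_favoring_first(model_a, model_b, rmsr))
print("favoring A by MAR: ", count_favoring_first(model_a, model_b, mar))
```

In this toy example a single large residual in the first data set changes the RMSR comparison but not the MAR comparison, which is the sense in which the MAR is the more robust statistic.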

11.  Summary  and  Conclusion 

In  this  report  we  have  advanced  a  notion  about  how  alternative  statistical 
procedures  might  be  compared  in  the  absence  of  adequate  characterization  of 
the  data  in  terms  of  requirements  of  the  procedures  and/or  adequate 
understanding  of  the  procedures.  Our  proposal  is  that  the  criteria  normally 
used  to  select  among  alternative  experimental  procedures  be  applied  in  this 
situation.  These  criteria  are  often  only  implicit,  however,  and  may  well  depend 
on  the  field  of  study  (and  quite  possibly  the  particular  investigator);  hence  in 
Section  2  we  defined  and  discussed  six  such  criteria.  We  then  described  a 
domain  of  data  with  respect  to  which  we  have  thought  about  this  issue,  and 
introduced  the  multiple  regression  model  that  we  have  been  applying  to  these 
data.  We  described  the  two  methods  of  fitting  the  multiple-regression  model 
that  we  have  been  using,  one  being  ordinary  least-squares  regression,  the  other 
Huber’s  "robust"  iteratively  reweighted  least-squares  method.  Finally  we 
reported  our  findings  from  five  tests  in  which  we  applied  the  criteria  defined 
earlier  in  a  comparison  of  the  two  methods:  In  two  of  the  tests  they  did  not 
differ  substantially,  while  three  tests  favored  the  robust  method. 